Efficient Index for Weighted Sequences

Authors

  • Carl Barton
  • Tomasz Kociumaka
  • Solon P. Pissis
  • Jakub Radoszewski
Abstract

The problem of finding factors of a text string that are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet, a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of the probabilities of its letters at consecutive positions. Given a probability threshold 1/z, we say that a pattern string P matches a weighted text at position i if the product of the probabilities of the letters of P at positions i, ..., i + |P| − 1 in the text is at least 1/z. In this article, for a weighted text of length n, we present an O(nz)-time construction of an O(nz)-sized index that can answer pattern matching queries in a weighted text in optimal time, improving upon the state of the art by a factor of z log z. Other applications of this data structure include an O(nz)-time construction of the weighted prefix table and an O(nz)-time computation of all covers of a weighted sequence, which improve upon the state of the art by the same factor.
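The matching condition defined above can be checked directly by a brute-force scan. The sketch below is not the article's O(nz)-time index; it only evaluates the definition at every position, costing O(n|P|) time per query. The per-position dictionary representation and the names `occurrence_probability` and `weighted_occurrences` are hypothetical choices made for illustration. Exact `Fraction` arithmetic is used so that products landing exactly on the 1/z threshold are compared without floating-point rounding.

```python
from fractions import Fraction
from typing import Dict, List

# Hypothetical representation: one dict per text position, mapping each letter
# to its probability of occurrence there (letters absent from a dict have probability 0).
WeightedSequence = List[Dict[str, Fraction]]


def occurrence_probability(ws: WeightedSequence, pattern: str, i: int) -> Fraction:
    """Product of the probabilities of pattern's letters at positions i, ..., i + |P| - 1."""
    if i < 0 or i + len(pattern) > len(ws):
        return Fraction(0)
    prob = Fraction(1)
    for j, letter in enumerate(pattern):
        prob *= ws[i + j].get(letter, Fraction(0))
        if prob == 0:  # no later factor can make the product positive again
            break
    return prob


def weighted_occurrences(ws: WeightedSequence, pattern: str, z: int) -> List[int]:
    """All 0-indexed positions i where pattern matches ws with probability at least 1/z."""
    threshold = Fraction(1, z)
    return [
        i
        for i in range(len(ws) - len(pattern) + 1)
        if occurrence_probability(ws, pattern, i) >= threshold
    ]


# Usage: a weighted text of length 4 over {A, C, G}.
example: WeightedSequence = [
    {"A": Fraction(1)},
    {"A": Fraction(1, 2), "C": Fraction(1, 2)},
    {"C": Fraction(3, 4), "G": Fraction(1, 4)},
    {"A": Fraction(1)},
]

print(weighted_occurrences(example, "AC", 2))  # [0]
print(weighted_occurrences(example, "AC", 4))  # [0, 1]
print(weighted_occurrences(example, "CA", 2))  # [2]
```

Raising z lowers the threshold 1/z, so more positions qualify: "AC" at position 1 has probability 1/2 · 3/4 = 3/8, which fails for z = 2 but passes for z = 4.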


Similar resources

Computation of the Sadhana (Sd) Index of Linear Phenylenes and Corresponding Hexagonal Sequences

The Sadhana index (Sd) is a newly introduced cyclic index. Efficient formulae for calculating the Sd (Sadhana) index of linear phenylenes are given and a simple relation is established between the Sd index of phenylenes and of the corresponding hexagonal sequences.

I-45: Advance MRI Sequences in Pelvic Endometriosis

Background: To assess MRI in diagnosing endometriotic lesions, with emphasis on the efficacy of T2*-weighted imaging. Materials and Methods: This prospective study included 48 females (22-38 years, average 29.6) clinically suspected of endometriosis, examined from September 2009 to April 2012. MRI was performed with a 1.5 T imager (Siemens) with a body array coil. T1-, T2- and T2*-weighted (2D-FLASH) sequences were obtained ...

Indexing Weighted Sequences: Neat and Efficient

In a weighted sequence, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or uncertain data, for example, in molecular biology where they are known under the name of Position-Weight Matrices. Given a probability threshold 1/z, we say that a str...

A Multi-Objective Particle Swarm Optimization for Mixed-Model Assembly Line Balancing with Different Skilled Workers

This paper presents a multi-objective Particle Swarm Optimization (PSO) algorithm for worker assignment and mixed-model assembly line balancing problem when task times depend on the worker’s skill level. The objectives of this model are minimization of the number of stations (equivalent to the maximization of the weighted line efficiency), minimization of the weighted smoothness index and minim...

The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications

In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus pattern matching or identification of repeated patterns, in biological weighted sequences is a ve...

Publication date: 2016